Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 5971 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 699.7 KiB |
| Average record size in memory | 120.0 B |
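The memory figures above agree with one another. As a rough check, the sketch below assumes that every column is stored as an 8-byte value and that the row index costs another 8 bytes per record; the report does not state either assumption, so treat this as a plausibility check rather than a description of how the tool measures memory.

```python
# Values copied from the "Dataset statistics" table.
n_variables = 14
n_observations = 5971
total_kib = 699.7

# Assumption (not stated in the report): 8 bytes per stored value,
# plus 8 bytes per row for the index.
BYTES_PER_VALUE = 8
record_bytes = (n_variables + 1) * BYTES_PER_VALUE

# Average record size implied by the reported total size.
implied_record_bytes = total_kib * 1024 / n_observations

print(record_bytes)                    # 120
print(round(implied_record_bytes, 1))  # 120.0
```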
Variable types
| Numeric | 14 |
|---|---|
purchases is highly correlated with quantity_p and 1 other fields | High correlation |
devolutions is highly correlated with quantity_p and 3 other fields | High correlation |
recency_p is highly correlated with avg_recency_days | High correlation |
recency_d is highly correlated with invoices_d | High correlation |
quantity_p is highly correlated with purchases and 4 other fields | High correlation |
quantity_d is highly correlated with devolutions and 3 other fields | High correlation |
invoices_p is highly correlated with purchases and 1 other fields | High correlation |
invoices_d is highly correlated with recency_d and 1 other fields | High correlation |
avg_ticket is highly correlated with devolutions and 3 other fields | High correlation |
avg_recency_days is highly correlated with recency_p | High correlation |
avg_basket_size is highly correlated with devolutions and 3 other fields | High correlation |
purchases is highly correlated with quantity_p and 3 other fields | High correlation |
devolutions is highly correlated with recency_d and 2 other fields | High correlation |
recency_p is highly correlated with invoices_p and 1 other fields | High correlation |
recency_d is highly correlated with devolutions and 2 other fields | High correlation |
quantity_p is highly correlated with purchases and 2 other fields | High correlation |
quantity_d is highly correlated with devolutions and 2 other fields | High correlation |
invoices_p is highly correlated with purchases and 4 other fields | High correlation |
invoices_d is highly correlated with devolutions and 2 other fields | High correlation |
avg_recency_days is highly correlated with recency_p and 1 other fields | High correlation |
avg_basket_size is highly correlated with purchases and 2 other fields | High correlation |
avg_variety is highly correlated with purchases and 1 other fields | High correlation |
purchases_pday is highly correlated with invoices_p | High correlation |
purchases is highly correlated with quantity_p and 2 other fields | High correlation |
devolutions is highly correlated with recency_d and 2 other fields | High correlation |
recency_p is highly correlated with invoices_p and 1 other fields | High correlation |
recency_d is highly correlated with devolutions and 2 other fields | High correlation |
quantity_p is highly correlated with purchases and 1 other fields | High correlation |
quantity_d is highly correlated with devolutions and 2 other fields | High correlation |
invoices_p is highly correlated with purchases and 1 other fields | High correlation |
invoices_d is highly correlated with devolutions and 2 other fields | High correlation |
avg_recency_days is highly correlated with recency_p | High correlation |
avg_basket_size is highly correlated with purchases and 2 other fields | High correlation |
avg_variety is highly correlated with avg_basket_size | High correlation |
invoices_d is highly correlated with invoices_p and 1 other fields | High correlation |
invoices_p is highly correlated with invoices_d and 1 other fields | High correlation |
avg_basket_size is highly correlated with purchases and 4 other fields | High correlation |
recency_p is highly correlated with customer_id and 1 other fields | High correlation |
customer_id is highly correlated with recency_p and 1 other fields | High correlation |
purchases is highly correlated with invoices_d and 6 other fields | High correlation |
devolutions is highly correlated with avg_basket_size and 4 other fields | High correlation |
quantity_p is highly correlated with avg_basket_size and 4 other fields | High correlation |
avg_ticket is highly correlated with avg_basket_size and 4 other fields | High correlation |
quantity_d is highly correlated with avg_basket_size and 4 other fields | High correlation |
avg_recency_days is highly correlated with recency_p and 2 other fields | High correlation |
recency_d is highly correlated with avg_recency_days | High correlation |
purchases is highly skewed (γ1 = 21.77363976) | Skewed |
devolutions is highly skewed (γ1 = 50.91642437) | Skewed |
quantity_p is highly skewed (γ1 = 35.09784254) | Skewed |
quantity_d is highly skewed (γ1 = 53.23013972) | Skewed |
avg_ticket is highly skewed (γ1 = 51.96108487) | Skewed |
avg_basket_size is highly skewed (γ1 = 49.85733829) | Skewed |
customer_id has unique values | Unique |
purchases has 215 (3.6%) zeros | Zeros |
devolutions has 4201 (70.4%) zeros | Zeros |
quantity_p has 215 (3.6%) zeros | Zeros |
quantity_d has 4201 (70.4%) zeros | Zeros |
invoices_p has 215 (3.6%) zeros | Zeros |
invoices_d has 4201 (70.4%) zeros | Zeros |
avg_ticket has 215 (3.6%) zeros | Zeros |
avg_basket_size has 215 (3.6%) zeros | Zeros |
avg_variety has 215 (3.6%) zeros | Zeros |
purchases_pday has 215 (3.6%) zeros | Zeros |
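The γ1 values in the skewness warnings are skewness coefficients. A minimal sketch of the moment-based definition follows, using only the standard library. The function name `skewness` and the toy data are illustrative assumptions, and the report's own figures may come from an estimator with a small-sample bias correction, so values on small samples could differ slightly.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tailed = [0, 0, 0, 1, 1, 2, 50]  # mostly small values, one large one

print(skewness(symmetric))         # 0.0
print(skewness(right_tailed) > 0)  # True
```

A long right tail, as in `purchases` or `devolutions` where most customers have small or zero totals and a few have very large ones, pushes the coefficient well above zero.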
Reproduction
| Analysis started | 2021-06-05 20:45:49.054417 |
|---|---|
| Analysis finished | 2021-06-05 20:46:18.248381 |
| Duration | 29.19 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
customer_id
| Distinct | 5971 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16765.63189 |
| Minimum | 12346 |
| Maximum | 22709 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 12346 |
|---|---|
| 5-th percentile | 12711 |
| Q1 | 14369.5 |
| median | 16392 |
| Q3 | 19241.5 |
| 95-th percentile | 21892.5 |
| Maximum | 22709 |
| Range | 10363 |
| Interquartile range (IQR) | 4872 |
Descriptive statistics
| Standard deviation | 2882.537033 |
|---|---|
| Coefficient of variation (CV) | 0.1719313088 |
| Kurtosis | -0.9581116481 |
| Mean | 16765.63189 |
| Median Absolute Deviation (MAD) | 2180 |
| Skewness | 0.3706824238 |
| Sum | 100107588 |
| Variance | 8309019.745 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 16384 | 1 | < 0.1% |
| 15665 | 1 | < 0.1% |
| 15677 | 1 | < 0.1% |
| 21767 | 1 | < 0.1% |
| 17722 | 1 | < 0.1% |
| 15673 | 1 | < 0.1% |
| 13322 | 1 | < 0.1% |
| 17718 | 1 | < 0.1% |
| 15669 | 1 | < 0.1% |
| 21484 | 1 | < 0.1% |
| Other values (5961) | 5961 |
| Value | Count | Frequency (%) |
| 12346 | 1 | |
| 12347 | 1 | |
| 12348 | 1 | |
| 12349 | 1 | |
| 12350 | 1 | |
| 12352 | 1 | |
| 12353 | 1 | |
| 12354 | 1 | |
| 12355 | 1 | |
| 12356 | 1 |
| Value | Count | Frequency (%) |
| 22709 | 1 | |
| 22708 | 1 | |
| 22707 | 1 | |
| 22706 | 1 | |
| 22705 | 1 | |
| 22704 | 1 | |
| 22700 | 1 | |
| 22699 | 1 | |
| 22696 | 1 | |
| 22695 | 1 |
purchases
Real number (ℝ≥0)
HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 5552 |
|---|---|
| Distinct (%) | 93.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1785.401859 |
| Minimum | 0 |
| Maximum | 280206.02 |
| Zeros | 215 |
| Zeros (%) | 3.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3.75 |
| Q1 | 206.585 |
| median | 599.97 |
| Q3 | 1588.97 |
| 95-th percentile | 5393.625 |
| Maximum | 280206.02 |
| Range | 280206.02 |
| Interquartile range (IQR) | 1382.385 |
Descriptive statistics
| Standard deviation | 7789.345865 |
|---|---|
| Coefficient of variation (CV) | 4.362796995 |
| Kurtosis | 620.2010104 |
| Mean | 1785.401859 |
| Median Absolute Deviation (MAD) | 495.15 |
| Skewness | 21.77363976 |
| Sum | 10660634.5 |
| Variance | 60673909 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 215 | 3.6% |
| 7.95 | 9 | 0.2% |
| 1.25 | 8 | 0.1% |
| 2.95 | 8 | 0.1% |
| 4.95 | 8 | 0.1% |
| 12.75 | 7 | 0.1% |
| 1.65 | 7 | 0.1% |
| 3.75 | 7 | 0.1% |
| 7.5 | 6 | 0.1% |
| 4.25 | 6 | 0.1% |
| Other values (5542) | 5690 |
| Value | Count | Frequency (%) |
| 0 | 215 | 3.6% |
| 0.42 | 1 | < 0.1% |
| 0.55 | 1 | < 0.1% |
| 0.65 | 1 | < 0.1% |
| 0.79 | 1 | < 0.1% |
| 0.84 | 3 | 0.1% |
| 0.85 | 3 | 0.1% |
| 1.07 | 1 | < 0.1% |
| 1.1 | 1 | < 0.1% |
| 1.25 | 8 | 0.1% |
| Value | Count | Frequency (%) |
| 280206.02 | 1 | |
| 259657.3 | 1 | |
| 194550.79 | 1 | |
| 168472.5 | 1 | |
| 143825.06 | 1 | |
| 124914.53 | 1 | |
| 117379.63 | 1 | |
| 91062.38 | 1 | |
| 81024.84 | 1 | |
| 77183.6 | 1 |
devolutions
Real number (ℝ≥0)
HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 1325 |
|---|---|
| Distinct (%) | 22.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 150.19206 |
| Minimum | 0 |
| Maximum | 168469.6 |
| Zeros | 4201 |
| Zeros (%) | 70.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 9.5 |
| 95-th percentile | 219.265 |
| Maximum | 168469.6 |
| Range | 168469.6 |
| Interquartile range (IQR) | 9.5 |
Descriptive statistics
| Standard deviation | 2602.83106 |
|---|---|
| Coefficient of variation (CV) | 17.33001771 |
| Kurtosis | 3072.075434 |
| Mean | 150.19206 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 50.91642437 |
| Sum | 896796.79 |
| Variance | 6774729.525 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 4201 | 70.4% |
| 12.75 | 22 | 0.4% |
| 4.95 | 19 | 0.3% |
| 15 | 17 | 0.3% |
| 9.95 | 15 | 0.3% |
| 5.9 | 13 | 0.2% |
| 25.5 | 11 | 0.2% |
| 4.25 | 10 | 0.2% |
| 3.75 | 9 | 0.2% |
| 19.9 | 9 | 0.2% |
| Other values (1315) | 1645 | 27.5% |
| Value | Count | Frequency (%) |
| 0 | 4201 | 70.4% |
| 0.42 | 2 | < 0.1% |
| 0.65 | 1 | < 0.1% |
| 0.77 | 1 | < 0.1% |
| 0.95 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 1.25 | 6 | 0.1% |
| 1.45 | 4 | 0.1% |
| 1.64 | 1 | < 0.1% |
| 1.65 | 5 | 0.1% |
| Value | Count | Frequency (%) |
| 168469.6 | 1 | |
| 77183.6 | 1 | |
| 39267 | 1 | |
| 30032.23 | 1 | |
| 22998.4 | 1 | |
| 17836.46 | 1 | |
| 16888.02 | 1 | |
| 16453.71 | 1 | |
| 13541.33 | 2 | |
| 13474.79 | 1 |
recency_p
| Distinct | 304 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 126.0015073 |
| Minimum | 0 |
| Maximum | 373 |
| Zeros | 38 |
| Zeros (%) | 0.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 24 |
| median | 77 |
| Q3 | 215 |
| 95-th percentile | 365 |
| Maximum | 373 |
| Range | 373 |
| Interquartile range (IQR) | 191 |
Descriptive statistics
| Standard deviation | 118.7308916 |
|---|---|
| Coefficient of variation (CV) | 0.9422973912 |
| Kurtosis | -0.8103832322 |
| Mean | 126.0015073 |
| Median Absolute Deviation (MAD) | 67 |
| Skewness | 0.7488472968 |
| Sum | 752355 |
| Variance | 14097.02462 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 365 | 234 | 3.9% |
| 1 | 110 | 1.8% |
| 4 | 105 | 1.8% |
| 3 | 99 | 1.7% |
| 2 | 92 | 1.5% |
| 10 | 86 | 1.4% |
| 8 | 82 | 1.4% |
| 9 | 80 | 1.3% |
| 17 | 79 | 1.3% |
| 7 | 78 | 1.3% |
| Other values (294) | 4926 |
| Value | Count | Frequency (%) |
| 0 | 38 | 0.6% |
| 1 | 110 | |
| 2 | 92 | |
| 3 | 99 | |
| 4 | 105 | |
| 5 | 52 | |
| 7 | 78 | |
| 8 | 82 | |
| 9 | 80 | |
| 10 | 86 |
| Value | Count | Frequency (%) |
| 373 | 23 | 0.4% |
| 372 | 23 | 0.4% |
| 371 | 17 | 0.3% |
| 369 | 4 | 0.1% |
| 368 | 13 | 0.2% |
| 367 | 18 | 0.3% |
| 366 | 15 | 0.3% |
| 365 | 234 | 3.9% |
| 364 | 11 | 0.2% |
| 362 | 7 | 0.1% |
recency_d
| Distinct | 281 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 297.9606431 |
| Minimum | 0 |
| Maximum | 373 |
| Zeros | 5 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 24 |
| Q1 | 282 |
| median | 365 |
| Q3 | 365 |
| 95-th percentile | 365 |
| Maximum | 373 |
| Range | 373 |
| Interquartile range (IQR) | 83 |
Descriptive statistics
| Standard deviation | 120.0996751 |
|---|---|
| Coefficient of variation (CV) | 0.4030722777 |
| Kurtosis | 0.4613608502 |
| Mean | 297.9606431 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -1.463353946 |
| Sum | 1779123 |
| Variance | 14423.93195 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 365 | 4215 | |
| 8 | 45 | 0.8% |
| 64 | 39 | 0.7% |
| 46 | 31 | 0.5% |
| 21 | 31 | 0.5% |
| 35 | 28 | 0.5% |
| 3 | 28 | 0.5% |
| 9 | 27 | 0.5% |
| 25 | 23 | 0.4% |
| 29 | 22 | 0.4% |
| Other values (271) | 1482 | 24.8% |
| Value | Count | Frequency (%) |
| 0 | 5 | 0.1% |
| 1 | 20 | |
| 2 | 13 | 0.2% |
| 3 | 28 | |
| 4 | 15 | 0.3% |
| 5 | 4 | 0.1% |
| 7 | 10 | 0.2% |
| 8 | 45 | |
| 9 | 27 | |
| 10 | 8 | 0.1% |
| Value | Count | Frequency (%) |
| 373 | 1 | < 0.1% |
| 372 | 8 | 0.1% |
| 371 | 2 | < 0.1% |
| 369 | 2 | < 0.1% |
| 368 | 9 | 0.2% |
| 367 | 11 | 0.2% |
| 366 | 7 | 0.1% |
| 365 | 4215 | |
| 364 | 2 | < 0.1% |
| 362 | 2 | < 0.1% |
quantity_p
Real number (ℝ≥0)
HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 817 |
|---|---|
| Distinct (%) | 13.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 254.9107352 |
| Minimum | 0 |
| Maximum | 80996 |
| Zeros | 215 |
| Zeros (%) | 3.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 34 |
| median | 91 |
| Q3 | 200 |
| 95-th percentile | 631.5 |
| Maximum | 80996 |
| Range | 80996 |
| Interquartile range (IQR) | 166 |
Descriptive statistics
| Standard deviation | 1701.491022 |
|---|---|
| Coefficient of variation (CV) | 6.674850396 |
| Kurtosis | 1502.456269 |
| Mean | 254.9107352 |
| Median Absolute Deviation (MAD) | 70 |
| Skewness | 35.09784254 |
| Sum | 1522072 |
| Variance | 2895071.697 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 295 | 4.9% |
| 0 | 215 | 3.6% |
| 3 | 117 | 2.0% |
| 6 | 53 | 0.9% |
| 28 | 50 | 0.8% |
| 21 | 48 | 0.8% |
| 16 | 48 | 0.8% |
| 67 | 42 | 0.7% |
| 52 | 41 | 0.7% |
| 36 | 41 | 0.7% |
| Other values (807) | 5021 |
| Value | Count | Frequency (%) |
| 0 | 215 | 3.6% |
| 1 | 295 | 4.9% |
| 2 | 26 | 0.4% |
| 3 | 117 | 2.0% |
| 4 | 26 | 0.4% |
| 5 | 21 | 0.4% |
| 6 | 53 | 0.9% |
| 7 | 25 | 0.4% |
| 8 | 19 | 0.3% |
| 9 | 17 | 0.3% |
| Value | Count | Frequency (%) |
| 80996 | 1 | |
| 74215 | 1 | |
| 38639 | 1 | |
| 21352 | 1 | |
| 17376 | 1 | |
| 17150 | 1 | |
| 16288 | 1 | |
| 15853 | 1 | |
| 13369 | 1 | |
| 12872 | 1 |
quantity_d
Real number (ℝ≥0)
HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 189 |
|---|---|
| Distinct (%) | 3.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.92095126 |
| Minimum | -0 |
| Maximum | 80995 |
| Zeros | 4201 |
| Zeros (%) | 70.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | -0 |
|---|---|
| 5-th percentile | -0 |
| Q1 | -0 |
| median | -0 |
| Q3 | 1 |
| 95-th percentile | 28 |
| Maximum | 80995 |
| Range | 80995 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1435.845873 |
|---|---|
| Coefficient of variation (CV) | 35.08828189 |
| Kurtosis | 2885.966697 |
| Mean | 40.92095126 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 53.23013972 |
| Sum | 244339 |
| Variance | 2061653.371 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| -0 | 4201 | 70.4% |
| 1 | 512 | 8.6% |
| 3 | 174 | 2.9% |
| 2 | 92 | 1.5% |
| 6 | 90 | 1.5% |
| 4 | 77 | 1.3% |
| 5 | 46 | 0.8% |
| 12 | 46 | 0.8% |
| 7 | 42 | 0.7% |
| 8 | 40 | 0.7% |
| Other values (179) | 651 | 10.9% |
| Value | Count | Frequency (%) |
| -0 | 4201 | 70.4% |
| 1 | 512 | 8.6% |
| 2 | 92 | 1.5% |
| 3 | 174 | 2.9% |
| 4 | 77 | 1.3% |
| 5 | 46 | 0.8% |
| 6 | 90 | 1.5% |
| 7 | 42 | 0.7% |
| 8 | 40 | 0.7% |
| 9 | 37 | 0.6% |
| Value | Count | Frequency (%) |
| 80995 | 1 | |
| 74215 | 1 | |
| 9361 | 1 | |
| 9014 | 1 | |
| 4873 | 1 | |
| 4027 | 1 | |
| 2399 | 1 | |
| 2302 | 1 | |
| 2160 | 1 | |
| 1685 | 1 |
invoices_p
Real number (ℝ≥0)
HIGH CORRELATION, ZEROS
| Distinct | 60 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.339976553 |
| Minimum | 0 |
| Maximum | 209 |
| Zeros | 215 |
| Zeros (%) | 3.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 4 |
| 95-th percentile | 11 |
| Maximum | 209 |
| Range | 209 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 6.736955851 |
|---|---|
| Coefficient of variation (CV) | 2.01706681 |
| Kurtosis | 316.395646 |
| Mean | 3.339976553 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 13.45819778 |
| Sum | 19943 |
| Variance | 45.38657414 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 2916 | |
| 2 | 831 | 13.9% |
| 3 | 508 | 8.5% |
| 4 | 387 | 6.5% |
| 5 | 242 | 4.1% |
| 0 | 215 | 3.6% |
| 6 | 172 | 2.9% |
| 7 | 143 | 2.4% |
| 8 | 98 | 1.6% |
| 9 | 68 | 1.1% |
| Other values (50) | 391 | 6.5% |
| Value | Count | Frequency (%) |
| 0 | 215 | 3.6% |
| 1 | 2916 | |
| 2 | 831 | 13.9% |
| 3 | 508 | 8.5% |
| 4 | 387 | 6.5% |
| 5 | 242 | 4.1% |
| 6 | 172 | 2.9% |
| 7 | 143 | 2.4% |
| 8 | 98 | 1.6% |
| 9 | 68 | 1.1% |
| Value | Count | Frequency (%) |
| 209 | 1 | |
| 201 | 1 | |
| 124 | 1 | |
| 97 | 1 | |
| 93 | 1 | |
| 91 | 1 | |
| 86 | 1 | |
| 73 | 1 | |
| 63 | 1 | |
| 62 | 1 |
invoices_d
Real number (ℝ≥0)
HIGH CORRELATION, ZEROS
| Distinct | 27 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6421035003 |
| Minimum | 0 |
| Maximum | 47 |
| Zeros | 4201 |
| Zeros (%) | 70.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 47 |
| Range | 47 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.861996514 |
|---|---|
| Coefficient of variation (CV) | 2.899838598 |
| Kurtosis | 174.8938214 |
| Mean | 0.6421035003 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 10.06208452 |
| Sum | 3834 |
| Variance | 3.467031018 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 4201 | 70.4% |
| 1 | 1068 | 17.9% |
| 2 | 308 | 5.2% |
| 3 | 147 | 2.5% |
| 4 | 97 | 1.6% |
| 5 | 44 | 0.7% |
| 6 | 30 | 0.5% |
| 7 | 22 | 0.4% |
| 8 | 9 | 0.2% |
| 9 | 7 | 0.1% |
| Other values (17) | 38 | 0.6% |
| Value | Count | Frequency (%) |
| 0 | 4201 | 70.4% |
| 1 | 1068 | 17.9% |
| 2 | 308 | 5.2% |
| 3 | 147 | 2.5% |
| 4 | 97 | 1.6% |
| 5 | 44 | 0.7% |
| 6 | 30 | 0.5% |
| 7 | 22 | 0.4% |
| 8 | 9 | 0.2% |
| 9 | 7 | 0.1% |
| Value | Count | Frequency (%) |
| 47 | 1 | |
| 45 | 1 | |
| 35 | 1 | |
| 31 | 1 | |
| 27 | 1 | |
| 23 | 1 | |
| 21 | 1 | |
| 19 | 2 | |
| 18 | 1 | |
| 17 | 2 |
avg_ticket
| Distinct | 5576 |
|---|---|
| Distinct (%) | 93.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 60.59449396 |
| Minimum | 0 |
| Maximum | 77183.6 |
| Zeros | 215 |
| Zeros (%) | 3.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1.918888889 |
| Q1 | 7.920322992 |
| median | 15.7246 |
| Q3 | 22.25928571 |
| 95-th percentile | 79.2140625 |
| Maximum | 77183.6 |
| Range | 77183.6 |
| Interquartile range (IQR) | 14.33896272 |
Descriptive statistics
| Standard deviation | 1274.292949 |
|---|---|
| Coefficient of variation (CV) | 21.02984719 |
| Kurtosis | 2881.449309 |
| Mean | 60.59449396 |
| Median Absolute Deviation (MAD) | 7.4314 |
| Skewness | 51.96108487 |
| Sum | 361809.7234 |
| Variance | 1623822.519 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 215 | 3.6% |
| 3.75 | 11 | 0.2% |
| 4.95 | 10 | 0.2% |
| 2.95 | 9 | 0.2% |
| 1.25 | 9 | 0.2% |
| 7.95 | 8 | 0.1% |
| 12.75 | 7 | 0.1% |
| 8.25 | 7 | 0.1% |
| 1.65 | 7 | 0.1% |
| 5.95 | 6 | 0.1% |
| Other values (5566) | 5682 |
| Value | Count | Frequency (%) |
| 0 | 215 | 3.6% |
| 0.42 | 2 | < 0.1% |
| 0.535 | 1 | < 0.1% |
| 0.55 | 1 | < 0.1% |
| 0.65 | 1 | < 0.1% |
| 0.79 | 1 | < 0.1% |
| 0.8371428571 | 1 | < 0.1% |
| 0.84 | 2 | < 0.1% |
| 0.85 | 3 | 0.1% |
| 1.002222222 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 77183.6 | 1 | |
| 56157.5 | 1 | |
| 13541.33 | 1 | |
| 13305.5 | 1 | |
| 11062.06 | 1 | |
| 4453.43 | 1 | |
| 4287.63 | 1 | |
| 3861 | 1 | |
| 3096 | 1 | |
| 2653.95 | 1 |
avg_recency_days
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 1280 |
|---|---|
| Distinct (%) | 21.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 120.87658 |
| Minimum | 0 |
| Maximum | 373 |
| Zeros | 4 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 39.63333333 |
| median | 80 |
| Q3 | 182.5 |
| 95-th percentile | 338 |
| Maximum | 373 |
| Range | 373 |
| Interquartile range (IQR) | 142.8666667 |
Descriptive statistics
| Standard deviation | 102.9113998 |
|---|---|
| Coefficient of variation (CV) | 0.8513758398 |
| Kurtosis | -0.2161395638 |
| Mean | 120.87658 |
| Median Absolute Deviation (MAD) | 53.3 |
| Skewness | 0.9688375366 |
| Sum | 721754.0593 |
| Variance | 10590.75621 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 46 | 37 | 0.6% |
| 53 | 34 | 0.6% |
| 39 | 33 | 0.6% |
| 28 | 32 | 0.5% |
| 60 | 32 | 0.5% |
| 353 | 31 | 0.5% |
| 213 | 29 | 0.5% |
| 367 | 28 | 0.5% |
| 106 | 28 | 0.5% |
| 184 | 28 | 0.5% |
| Other values (1270) | 5659 |
| Value | Count | Frequency (%) |
| 0 | 4 | 0.1% |
| 1 | 11 | |
| 2 | 7 | |
| 2.554794521 | 1 | < 0.1% |
| 3 | 14 | |
| 3.243478261 | 1 | < 0.1% |
| 3.300884956 | 1 | < 0.1% |
| 3.333333333 | 1 | < 0.1% |
| 3.5 | 1 | < 0.1% |
| 3.666666667 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 373 | 21 | |
| 372 | 22 | |
| 371 | 18 | |
| 369 | 4 | 0.1% |
| 368 | 14 | |
| 367 | 28 | |
| 366 | 13 | |
| 365 | 19 | |
| 364 | 11 | 0.2% |
| 362 | 7 | 0.1% |
avg_basket_size
Real number (ℝ≥0)
HIGH CORRELATION, SKEWED, ZEROS
| Distinct | 2372 |
|---|---|
| Distinct (%) | 39.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 254.1652064 |
| Minimum | 0 |
| Maximum | 74215 |
| Zeros | 215 |
| Zeros (%) | 3.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 64 |
| median | 142.6666667 |
| Q3 | 282 |
| 95-th percentile | 718 |
| Maximum | 74215 |
| Range | 74215 |
| Interquartile range (IQR) | 218 |
Descriptive statistics
| Standard deviation | 1170.070864 |
|---|---|
| Coefficient of variation (CV) | 4.603583947 |
| Kurtosis | 2915.667275 |
| Mean | 254.1652064 |
| Median Absolute Deviation (MAD) | 98.66666667 |
| Skewness | 49.85733829 |
| Sum | 1517620.447 |
| Variance | 1369065.827 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 215 | 3.6% |
| 1 | 168 | 2.8% |
| 2 | 72 | 1.2% |
| 3 | 54 | 0.9% |
| 4 | 51 | 0.9% |
| 5 | 36 | 0.6% |
| 6 | 28 | 0.5% |
| 12 | 26 | 0.4% |
| 73 | 21 | 0.4% |
| 100 | 21 | 0.4% |
| Other values (2362) | 5279 |
| Value | Count | Frequency (%) |
| 0 | 215 | 3.6% |
| 1 | 168 | 2.8% |
| 1.5 | 1 | < 0.1% |
| 2 | 72 | 1.2% |
| 3 | 54 | 0.9% |
| 3.333333333 | 1 | < 0.1% |
| 4 | 51 | 0.9% |
| 5 | 36 | 0.6% |
| 5.333333333 | 1 | < 0.1% |
| 5.666666667 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 74215 | 1 | |
| 40498.5 | 1 | |
| 14149 | 1 | |
| 13956 | 1 | |
| 7824 | 1 | |
| 6009.333333 | 1 | |
| 5964 | 1 | |
| 5198 | 1 | |
| 4300 | 1 | |
| 4280 | 1 |
avg_variety
| Distinct | 1279 |
|---|---|
| Distinct (%) | 21.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.20798644 |
| Minimum | 0 |
| Maximum | 1114 |
| Zeros | 215 |
| Zeros (%) | 3.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 7.5 |
| median | 17 |
| Q3 | 34 |
| 95-th percentile | 171 |
| Maximum | 1114 |
| Range | 1114 |
| Interquartile range (IQR) | 26.5 |
Descriptive statistics
| Standard deviation | 75.8309033 |
|---|---|
| Coefficient of variation (CV) | 1.984687244 |
| Kurtosis | 33.55299436 |
| Mean | 38.20798644 |
| Median Absolute Deviation (MAD) | 11.75 |
| Skewness | 5.086636807 |
| Sum | 228139.887 |
| Variance | 5750.325895 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 331 | 5.5% |
| 0 | 215 | 3.6% |
| 2 | 164 | 2.7% |
| 3 | 114 | 1.9% |
| 13 | 102 | 1.7% |
| 10 | 97 | 1.6% |
| 14 | 96 | 1.6% |
| 4 | 96 | 1.6% |
| 5 | 95 | 1.6% |
| 9 | 94 | 1.6% |
| Other values (1269) | 4567 |
| Value | Count | Frequency (%) |
| 0 | 215 | 3.6% |
| 1 | 331 | 5.5% |
| 1.2 | 1 | < 0.1% |
| 1.25 | 1 | < 0.1% |
| 1.333333333 | 2 | < 0.1% |
| 1.5 | 9 | 0.2% |
| 1.555555556 | 1 | < 0.1% |
| 1.571428571 | 1 | < 0.1% |
| 1.666666667 | 4 | 0.1% |
| 1.833333333 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1114 | 1 | |
| 749 | 1 | |
| 731 | 1 | |
| 721 | 1 | |
| 705 | 1 | |
| 687 | 1 | |
| 676 | 1 | |
| 675 | 1 | |
| 662 | 1 | |
| 651 | 1 |
purchases_pday
| Distinct | 1243 |
|---|---|
| Distinct (%) | 20.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5312110211 |
| Minimum | 0 |
| Maximum | 17 |
| Zeros | 215 |
| Zeros (%) | 3.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 93.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.007782218992 |
| Q1 | 0.02285714286 |
| median | 0.6666666667 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 17 |
| Range | 17 |
| Interquartile range (IQR) | 0.9771428571 |
Descriptive statistics
| Standard deviation | 0.5507143751 |
|---|---|
| Coefficient of variation (CV) | 1.03671489 |
| Kurtosis | 132.9126315 |
| Mean | 0.5312110211 |
| Median Absolute Deviation (MAD) | 0.5 |
| Skewness | 4.713845887 |
| Sum | 3171.861007 |
| Variance | 0.3032863229 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 2925 | |
| 0 | 215 | 3.6% |
| 2 | 50 | 0.8% |
| 0.0625 | 18 | 0.3% |
| 0.02777777778 | 17 | 0.3% |
| 0.02380952381 | 17 | 0.3% |
| 0.09090909091 | 15 | 0.3% |
| 0.08333333333 | 14 | 0.2% |
| 0.02941176471 | 13 | 0.2% |
| 0.07692307692 | 13 | 0.2% |
| Other values (1233) | 2674 |
| Value | Count | Frequency (%) |
| 0 | 215 | 3.6% |
| 0.005449591281 | 1 | < 0.1% |
| 0.005464480874 | 1 | < 0.1% |
| 0.005479452055 | 1 | < 0.1% |
| 0.005494505495 | 1 | < 0.1% |
| 0.005586592179 | 2 | < 0.1% |
| 0.005602240896 | 1 | < 0.1% |
| 0.005617977528 | 2 | < 0.1% |
| 0.00566572238 | 1 | < 0.1% |
| 0.005681818182 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 17 | 1 | < 0.1% |
| 4 | 2 | < 0.1% |
| 3 | 4 | 0.1% |
| 2 | 50 | 0.8% |
| 1.142857143 | 1 | < 0.1% |
| 1 | 2925 | |
| 0.75 | 1 | < 0.1% |
| 0.6666666667 | 4 | 0.1% |
| 0.5588235294 | 1 | < 0.1% |
| 0.5388739946 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r (only the sign of the slope determines the sign of r).
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
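The calculation described for Pearson's r can be sketched in plain Python. This is a minimal illustration, not the code the report uses; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Unnormalised sums are enough: the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
# Shifting and rescaling either variable (with a positive scale) leaves r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(round(r, 4))                 # 0.7746
print(math.isclose(r, r_shifted))  # True
```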
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
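As a hedged sketch of that definition, ρ can be obtained by ranking each variable and applying the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the data are illustrative assumptions; ties are handled here by giving tied values the mean of their positions, which is one common convention.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

print(spearman_rho(x, y))     # 1.0
print(pearson_r(x, y) < 1.0)  # True
```

The example shows why the two coefficients can disagree: a perfectly monotonic but curved relationship gives ρ = 1 while r stays below 1.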
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
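The pair-counting description corresponds to the variant usually called tau-a, sketched below with hypothetical names and data. Library implementations often use a tie-corrected variant (tau-b) instead, so values may differ from this sketch when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]

tau = kendall_tau_a(x, y)
print(tau)  # 0.4  (7 concordant pairs, 3 discordant pairs, 10 pairs in total)
```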
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | customer_id | purchases | devolutions | recency_p | recency_d | quantity_p | quantity_d | invoices_p | invoices_d | avg_ticket | avg_recency_days | avg_basket_size | avg_variety | purchases_pday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17850 | 5391.21 | 102.58 | 372.0 | 302.0 | 35.0 | 21.0 | 34.0 | 1.0 | 18.152222 | 124.333333 | 50.970588 | 8.735294 | 17.000000 |
| 1 | 13047 | 3237.54 | 158.44 | 31.0 | 31.0 | 132.0 | 6.0 | 10.0 | 8.0 | 18.822907 | 26.642857 | 139.100000 | 17.200000 | 0.029155 |
| 2 | 12583 | 7281.38 | 94.04 | 2.0 | 56.0 | 1569.0 | 50.0 | 15.0 | 3.0 | 29.479271 | 20.722222 | 337.333333 | 16.466667 | 0.040323 |
| 3 | 13748 | 948.25 | 0.00 | 95.0 | 365.0 | 169.0 | -0.0 | 5.0 | 0.0 | 33.866071 | 93.250000 | 87.800000 | 5.600000 | 0.017921 |
| 4 | 15100 | 876.00 | 240.90 | 333.0 | 330.0 | 48.0 | 22.0 | 3.0 | 3.0 | 292.000000 | 62.166667 | 26.666667 | 1.000000 | 0.073171 |
| 5 | 15291 | 4668.30 | 71.79 | 25.0 | 172.0 | 508.0 | 27.0 | 15.0 | 5.0 | 45.323301 | 21.941176 | 140.200000 | 6.866667 | 0.042980 |
| 6 | 14688 | 5630.87 | 523.49 | 7.0 | 7.0 | 579.0 | 281.0 | 21.0 | 6.0 | 17.219786 | 17.761905 | 172.428571 | 15.571429 | 0.057221 |
| 7 | 17809 | 5411.91 | 784.29 | 16.0 | 16.0 | 961.0 | 41.0 | 12.0 | 3.0 | 88.719836 | 31.083333 | 171.416667 | 5.083333 | 0.033520 |
| 8 | 15311 | 60767.90 | 1348.56 | 0.0 | 0.0 | 2167.0 | 231.0 | 91.0 | 27.0 | 25.543464 | 4.098901 | 419.714286 | 26.142857 | 0.243316 |
| 9 | 14527 | 8508.82 | 797.44 | 2.0 | 8.0 | 198.0 | 3.0 | 55.0 | 31.0 | 8.753930 | 5.828125 | 37.981818 | 17.672727 | 0.149457 |
Last rows
| | customer_id | purchases | devolutions | recency_p | recency_d | quantity_p | quantity_d | invoices_p | invoices_d | avg_ticket | avg_recency_days | avg_basket_size | avg_variety | purchases_pday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5961 | 22700 | 4839.42 | 0.0 | 1.0 | 365.0 | 917.0 | -0.0 | 1.0 | 0.0 | 78.055161 | 1.0 | 1074.0 | 62.0 | 1.0 |
| 5962 | 13298 | 360.00 | 0.0 | 1.0 | 365.0 | 96.0 | -0.0 | 1.0 | 0.0 | 180.000000 | 1.0 | 96.0 | 2.0 | 1.0 |
| 5963 | 14569 | 227.39 | 0.0 | 1.0 | 365.0 | 70.0 | -0.0 | 1.0 | 0.0 | 18.949167 | 1.0 | 79.0 | 12.0 | 1.0 |
| 5964 | 22704 | 17.90 | 0.0 | 1.0 | 365.0 | 2.0 | -0.0 | 1.0 | 0.0 | 2.557143 | 1.0 | 14.0 | 7.0 | 1.0 |
| 5965 | 22705 | 3.35 | 0.0 | 1.0 | 365.0 | 1.0 | -0.0 | 1.0 | 0.0 | 1.675000 | 1.0 | 2.0 | 2.0 | 1.0 |
| 5966 | 22706 | 6637.59 | 0.0 | 1.0 | 365.0 | 430.0 | -0.0 | 1.0 | 0.0 | 10.452898 | 1.0 | 1748.0 | 635.0 | 1.0 |
| 5967 | 22707 | 7689.23 | 0.0 | 0.0 | 365.0 | 347.0 | -0.0 | 1.0 | 0.0 | 10.518782 | 0.0 | 2011.0 | 731.0 | 1.0 |
| 5968 | 22708 | 3217.20 | 0.0 | 0.0 | 365.0 | 524.0 | -0.0 | 1.0 | 0.0 | 54.528814 | 0.0 | 654.0 | 59.0 | 1.0 |
| 5969 | 22709 | 5664.89 | 0.0 | 0.0 | 365.0 | 211.0 | -0.0 | 1.0 | 0.0 | 25.985734 | 0.0 | 732.0 | 218.0 | 1.0 |
| 5970 | 12713 | 848.55 | 0.0 | 0.0 | 365.0 | 101.0 | -0.0 | 1.0 | 0.0 | 22.330263 | 0.0 | 508.0 | 38.0 | 1.0 |